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Abstract 

We study the computational complexity of a very basic problem, namely that of finding solutions 
to a very large set of random linear equations in a finite Galois Field modulo q. Using tools from 
statistical mechanics we are able to identify phase transitions in the structure of the solution space 
and to connect them to changes in performance of a global algorithm, namely Gaussian elimina- 
tion. Crossing phase boundaries produces a dramatic increase in memory and CPU requirements 
necessary to the algorithms. In turn, this causes the saturation of the upper bounds for the running 
time. We illustrate the results on the specific problem of integer factorization, which is of central 
interest for deciphering messages encrypted with the RSA cryptosystem. 
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I. INTRODUCTION 



The methods and concepts of statistical physics of disordered systems constitute a very 
useful tool for the understanding of the onset of computational complexity in randomly 
generated hard combinatorial problems. Once the optimization problems are translated into 
zero temperature spin glass problems, one may study the geometrical changes in the space of 
solutions as symmetry breaking phenomena. In this context one may view the exponential 
regimes of randomized search algorithms as out-of-equilibrium phases of stochastic processes. 

However, combinatorial problems are not always exponentially hard: Problems that can 
be solved in polynomial time, even in their worst-case realizations compose the so called 
Polynomial (P) class Jjj]. Such problems are often of great practical relevance and are 
tackled using large scale computations. Examples can be found in all disciplines: In physics, 
just to make one example, one may study ground states of 2D spin glass like Hamiltonians 
resorting to a polynomial max-cut algorithm ||. The major application are obviously found 
in engineering: Examples are design problems (finite elements methods), control theory 
(convex optimization), coding theory (parity check equations) and cryptography (integer 
factorization) . 

Due to the practical relevance of the problems and to the typically large number of 
variables used for their encoding, that is the size of the problems, it is of basic interest to 
look at the fine structure of the class P in order to concretely optimize the computational 
strategies. For instance, in error correcting codes it is crucial to have algorithms that 
converge in linear time with respect to the number of encoded bits, any power larger than 
one being considered of no practical interest. 

Quite in general, the trade-off between time and memory resources is the guiding crite- 
rion which selects the algorithms used in real-world applications. Roughly speaking polyno- 
mial algorithms can be divided in different groups depending on the solving strategy they 
implement. The main groups are local algorithms (e.g. greedy/gradient methods), global 
algorithms (e.g. Gaussian elimination or Fourier transforms methods), iterative algorithms 
(e.g. Lanczos method) and parallel algorithms. See Ref. || for a basic introduction to the 
subject. 

In what follows we shall study a prototype problem of the P class, that is the problem of 
solving large and random sparse systems in some Galois field GF(q). Working in GF(g) is 
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completely equivalent to perform any operation modulo q. 

Firstly, we give a precise analysis of the computational features for non-trivial ensembles 
of random instances. By a statistical mechanics study, we look into the - symmetry break- 
ing - geometrical structure of the space of solution thereby providing an explanation for the 
changes in the power law behavior observed in different algorithms. Moreover, we are able 
to predict and explain in terms of clustering of solutions, the memory catastrophe found in 
global algorithms such as Gaussian elimination. Such an effect seriously hampers applica- 
tion of this sort of global algorithms in many circumstances, one example being symbolic 
manipulations. This memory catastrophe induce in turn an even more dramatic increase in 
CPU time, which make large problems unaffordable above the dynamical threshold 7^ (see 
below for its definition). 

Secondly, we consider a specific "real-world" application, namely the Integer Factoriza- 
tion problem used in RSA public key cryptography ||J]. By a non-trivial mapping of the 
factoring problem on a sparse linear system modulo 2, endowed with a quite peculiar sta- 
tistical distribution of matrix elements, we analyze which are the characteristic geometrical 
properties of solutions that are responsible for the usage of specific algorithms and constitute 
the possible bottleneck for the near future. 

Interestingly enough, the changes in both time or memory requirements during the solu- 
tion process of sparse systems can be interpreted in physical terms as a dynamical transition 
at which the phase space of the associated physical systems becomes split into an exponential 
number of ergodic components. While it is to be expected that local algorithms get stuck by 
local minima at such phase boundary, it is less obvious to predict which is the counterpart 
of the dynamical transition in global algorithms, for which polynomial time convergence is 
guaranteed even for the hardest instances. Indeed the dynamical transition manifests itself 
as a phase transition in the computational requirements which in turn leads to a slowing 
down phenomenon that saturates the upper bound for the convergence time. Such a change 
of scale in memory requirements constitute a serious problem for hardware implementations 
of large scale simulations. 

Hopefully, the paper is written in a style accessible to a cross disciplinary audience. 
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II. RANDOM LINEAR SYSTEMS IN GF(2): RIGOROUS RESULTS AND STA- 
TISTICAL MECHANICS ANALYSIS 



The theory of random equations in finite fields is shared by probability, combinatorics 
and algebra [H. 

For the sake of simplicity we limit our statistical mechanics analysis to GF(2) rather than 
GF(g), the extension to q > 2 being straightforward though technically involved. 

As is well known in the context of error correcting codes ||, solving a sparse linear system 
modulo 2 is equivalent to finding the zero temperature ground states of a class of multiple 
degree interactions p-spin models on diluted random graphs. 

Let us consider a random linear system in GF(2) in the form Ax = y mod 2, where A is 
a 0-1 matrix of dimension M x N. For each of its specific choices A can be interpreted as 
the contact matrix of a particular random (hyper-)graph belonging to a specific ensemble. 
The class of random matrices we shall deal with are defined by the fraction of rows with 
k non zero elements. The latter are placed uniformly at random within each row. 

We focus on matrices that lead to graphs with an average connectivity value (k) = J2k a kk 
finite and much less than both M and N. We are interested to the limit of very large matrices, 
where we can assume N, M —>■ oo with a finite ratio 7 = M/N. 

This is the regime in which a study of the computational cost is important in that it 
applies directly to large scale computations. In the limit N, M — > 00 average quantities 
characterizing the system (e.g. the average fraction of violated equations) are known to be 
equal to the most probable values (i.e. their probability distribution is strongly peaked [0) 
and therefore single random large systems behave as the average over the ensemble. 

We will always assume a\ = at the beginning, since rows with a single one corresponds 
to trivial equations which can be removed a priori from the set. 

The equivalence between linear systems and spin models is quite straightforward. We 
start from a set of linear equations in GF(2), Ax = y, and we build up a spin Hamiltonian 
whose ground state energy E gs counts the minimal number of unsatisfied equations. In the 
case where E gs = 0, ground state configurations will correspond to solutions of the original 
set of linear equations and the zero-temperature entropy will count the number of such 
solutions. 

The construction is done as follows: For every equation, labelled by i G [1 . . . M], let us 
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define the set of variables x entering equation i as 



v(i)^{je[l...N]:A tJ = l} 



(1) 



With the transformation Sj = (— l) Xj and Jj = (— l) Vi , we have that every equation can be 
converted in a term of the Hamiltonian through 



N 



J2 AjXj = yi & 



y% II s i = J i 



(2) 



3 = 1 j£v{i) 

where the multi-spin interaction contain at least 2 spins since we set ai = 0. Then the 
Hamiltonian 



H 



M 



(3) 



M-j2 j i n *i 

i=l j'eu(i) 

fits the above requirements and can be used in the analytical treatment. 

A better form for the above Hamiltonian can be obtained grouping together fc-spin terms 
with the same k, that is 



H 



k ii<«2<...<«fc 



(4) 



where Sj = ±1 are Ising spins and the couplings Ji 1 i 2 ...i ht are i.i.d. quenched random variables 
taking values in {0, ±1}. The total number of interactions, that is of terms with J ^ 0, is 
M, and the energy is zero if and only if all the interactions are satisfied. For each unsatisfied 
interaction the energy increases by 1. 

The fraction of interactions of fc-spin kind is and thus the probability of having 
Jhi 2 ...i k 7^ equals a k M/(^j ~ ^yakk\/N k ~ 1 , while the sign of Ji x i 2 ...i k depends on the 
probability distribution of the components of y, 



P\Jiii2—ik 



N k-i 



2— Ik/ 



' N k-i 



P S(J ili2 ... ik -l) + (l~p) S(J ili2 ... ik + l) , (5) 



where p G [0, 1] controls the fraction of zeros in y. As long as the system admits at least one 
solution, it can always be brought by a gauge transformation in the form with p = 1 <^> y = 0. 
This corresponds to have positive or null couplings only, like in a diluted ferromagnetic 
model. 

In order to make a connection between the behavior of solving algorithms and the struc- 
ture of the matrix A, we study the geometrical properties of the space of solution, i.e. ground 
states of (£|), as a function of 7 for non-trivial choices of {a k }. We may have access to the 



structure of such a space by just performing the T = statistical mechanics analysis of the 
spin glass model, with control parameter 7. 

For 7 large enough, at say j c , the system of equations becomes over-determined and some 
of the equations can no longer be satisfied. This fact is reflected in the ground state energy 
of the associated spin glass model becoming positive. The interesting aspect of the problem 
is that, under proper conditions, there appears a clustering phenomenon with macroscopic 
algorithmic consequences at some intermediate value < 7 = 7^ < j c . We will focus our 
attention on the latter transition, thus assuming a priori that at least one solution always 
exist. This allow us to fix y = hereafter. 

The complete picture of the typical structure of the solution space can be obtained 
through a replica calculation, following a well tested scheme in diluted systems, as in || 
and |J. The results of such a calculations have been recently confirmed by a rigorous math- 
ematical derivation |10|, [TTH . We avoid here to repeat standard calculations already presented 
in m H and extensively reviewed in [|TJ], and we directly present the results. 

Due to the zero energy condition (E gs = for 7 < 7 C ), the dominance of thermodynamical 
states is purely to be determined in entropic terms. Defining So (7) as the logarithm of the 
number of solutions to Ax = divided by N, we have that 



So(7) = S(m, 7 ) = log(2) 
where m solves 



1 - m)[l - log(l - m)\ - 7^ a k (l - vi 

k>2 



k \ 



(6) 



G{m) = 1 - m - e ~ J ^>2 kakmk ~ 1 = . (7) 

When more than one solution to Eq.(^) exist, the one maximazing S(m, 7) must be chosen. 

At fixed {afc}, one can study the phase diagram as a function of 7. At low enough 7, 
Eq. (0) has only the trivial solution m = and the system is paramagnetic with entropy 
S(0, 7) = log(2) (1 — 7). Typically a non trivial magnetized solution for the order parameter, 
m* > 0, appears at a value 7^ such that 

dG(m 



G(m*) = and 



dm 

This solution becomes entropically favored at a value 7 C found solving 



= . (8) 

m=m* 



S(0,7c) = S(m*, 7c ) . (9) 
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The crucial observation is the following. At 7^, together with the magnetized solution, 
there appear other spin glass solutions to the saddle-point equation. In particular, it can be 
shown [[It], [13] that the difference between the paramagnetic and the ferromagnetic entropies, 



£( 7 ) = S(0, 7 )-S(m*, 7 ) , (10) 
gives the configurational entropy of the problem, that is the number of clusters of solu- 



tions [16 . There exist exp[£(7)iV] well separated clusters [Hamming distances ~ O(N)], 
each one containing a number exp[S(m*, 7 )iV] of closed solutions [Hamming distances 
-Oil)]. 

This clusterizations has two main consequences. Local algorithms for finding solutions 
running in linear time in N stop converging ||: this is the typical situation for greedy 
algorithm which get stuck in one of the most numerous local minima at a positive energy. 

Global algorithms, which are guaranteed to converge in polynomial time, need to keep 
track along computation of this complex structure of solutions and a memory linear in N 
turns out to be insufficient, as we will show below. 

The simple cases {a>2 = 1; — 0} and {03 = 1; = 0} are particularly illuminating 
. Self consistency equations for the order parameter and for the ground state entropy can 
be immediately retrieved from the general formulas ([I]) and (Q). 

The first case represents a simple Ising spin model on a random graph. The analysis of 
thermodynamic phases shows a trivial paramagnetic region at low 7, followed by a second 
order phase transition at 7^ = 7 C = 1/2 where both the ground state energy and the 
magnetization become positive in a continuous way. From the linear system point of view, 
working with sparse equations with only 2 variables per row is always easy, i.e. algorithms 
have no slowing down. The solving process progressively fixes variables and smoothly goes 
on until a complete solution has been found. Indeed, fixing one variable immediately fixes 
the other one within the equation, and this goes on in a cascade process that prevents the 
accumulation of too long symbolic memory-taking expressions. 



The second case of the 3-spin model was tackled in full detail in || |T3). It shares the 
general characteristics of all models without, or at most with a very small fraction of, 2-spin 
terms. In this case a gap between 7^ and 7 C opens up and a non vanishing configurational 
entropy E appears there. At 7 C the magnetization jumps to a finite value. 
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For a general choice of {a^}, the configurational entropy reads 



E( T ) = log(2) 



1-fl 



m 



)[1 - log(l -m)]+jJ2 a k mk 



k>2 



(11) 



where m is the largest solution to Eq. (|7|). As discussed in Ref. [|T3J, the values of 7^ and 7 C 
are found as the points where £(7) first appears with a non zero value and where it reaches 
zero again. 

The algorithmic consequences of having £(7) > have been already exposed in 
Refs. P, ^3| : For 7 > 7^ a glassy state with positive energy arises, which traps any local 



dynamics, preventing it to converge towards the ground state of zero energy. We conjecture 
the counterpart on global algorithms, such as Gaussian elimination, to be that the resolution 
time increases with N faster than linear. 

In the next section we will check the above conjecture with two different Gaussian elim- 
ination algorithms, none of which is able to solve the system in linear time for 7 > 7^. 



III. ALGORITHMS BEHAVIOUR 



In this section we analyze the performances of a couple of different 'Gaussian elimination' 
algorithms, their difference being in the order equations are solved. We will measure the 
number of operations and the size of the memory required for the solution of a set of linear 
equations, that is the complexity for finding all solutions to Ax = y. 

We will see that, for a generic ensemble of random problems, any algorithm undergoes an 
easy/hard transition at a certain 7 value, which can not be pushed beyond the dynamical 
transition threshold 7^ In this context we call easy such problems which are solvable with 
a CPU-time and memory of order N, and hard those requiring resources scaling with N a , 
where a > 1. 

Given a set of M linear equations in N variables, Gaussian elimination proceeds as 
follows [for concreteness we will always work in GF(2)]: At each step, it takes an equation, 
e.g. X\ + Xi + X3 — yi, solves it with respect to a variable, e.g. x\ — X2 + £3 + yi, and then 
it substitutes variable x% with the expression £2 + ^3 + Vi in all the equations still unsolved. 
This procedure gives all the solutions to any set of linear equations in, at most, 0(N 3 ) steps 
and using 0(N 2 ) memory. Nevertheless this bounds only holds in the worst case, namely 
when the matrix A is dense. Very often, in actual applications, the matrix is sparse and the 
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FIG. 1: Typical shape of the A t matrix after tN steps of Gaussian elimination. 

algorithm is faster. We define sparse a matrix with O(N) ones and dense that with 0(N 2 ) 
ones. 

In order to analyze the computational complexity of this problem, and its connections 
to phase transitions, we focus on a specific ensemble of random problems, generalizations 
to other ensembles being straightforward. We choose sets of M — 'jN linear equations, 
each one containing exactly k = 3 of the N variables, taking values in GF(2). Thus the 
connectivity of a variable, defined as the number of equations this variable enters in, takes 
values from a Poissonian distribution of mean 37. 

For very large N, that is in the thermodynamical limit, we are interested in how the 
complexity changes with 7. Moreover, for a fixed 7 such that the problem is hard, we would 
like to know when (in terms of the running-time t) and why the algorithm becomes slower 
and slower. 

The running-time t is measured as the number of equations already solved, normalized 
by N, and thus takes values in [0,7]. A t is the matrix representing the set of equations 
after tN steps, and it has the form shown in Fig. |l|. See Fig. |] for the actual shape of 
A t in a specific case with 1024 equations in 1024 variables. For ease of simplicity, we have 
reordered the variables and the equations of the system, such that, at the i-th step, we solve 
the i-th equation with respect to Xt. With this choice the left part of the matrix A t has 
ones on the diagonal and zeros below. The right part can be naturally divided in an upper 
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t = 0.0 



t = 0.3 



t = 0.7 



FIG. 2: The evolution of the At matrix for a specific 1024 x 1024 random system. Every dot 
corresponds to a 1 entry. 

part U and a lower one L. The density of ones in the L matrix — let us call it p(t; 7) - 
is uniform and depends on the initial 7, the time t and the algorithm used for solving the 
linear system. The density of ones in the U part is not uniform and varies from row to row, 
as shown in Fig. [I] with gray tones. For continuity reasons the density at the m-th row of 
U is exactly p(m/N; / ~f). Then U is sparse or dense depending on whether L is. Defining 
k(t;j) = p(t;j)N(l — t) the average number of ones per row in L, we have that a sparse 
(resp. dense) matrix corresponds to having a finite k (resp. p). 

At each time step, the number of operations required are directly related to the density 
of the matrix A t and thus to that of L. More specifically, solving with respect to the variable 
in the upper left corner of L, the number of operations is proportional to the number of 
ones in the first row of L, i.e. k(t; 7), times the number of rows of L having a one in the first 
column, i.e. p(t; 7)^(7 — t), and thus equals 

k{t- 1 )p{t- 1 )N{ 1 -t) = k 2l -^- t =N 2 p 2 { 1 -t){l-t) . (12) 

Then, if the matrix L is sparse a finite number of operations per step is enough, while 0(N 2 ) 
operations are required when L is dense. Integrating over time t G [0,7], we have that the 
total complexity is given by 

N rj^k 2 (t- n )dt = N 3 f( 7 -0(l-0p 2 (*;7)^ • (13) 
Jo 1 — t Jo 
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Since the function p(t; 7) is continuous in t, we conclude that 
p(t;j)<xl/N 



k(t; 7) finite 

p{t; 7) finite 
k(t;j) oc N 



for all t G [0, 7] 



> & Pmax(l) = 0^ 



for some t G [0, 7] > <^> p m ax{l) > <^> 



where 



Pmax (7) 



lim max p(t; 7) 

AT— ooie[0,7] 



CPU time oc JV 
Memory oc iV 

CPU time oc N 3 
Memory oc A^ 2 



(14) 
(15) 

(16) 



is the order parameter signaling the onset of the hard regime. 

Having found the relation between the density of ones in L and the computational com- 
plexity we are interested in, we can now run the algorithms and measure the density p(t; 7). 
The easy/hard transition should manifest itself with p ma x(l) becoming different from zero. 



A. Simplest Gaussian elimination 

Let us start with the simplest algorithm, which solves the equations in the same (random) 
order they appear in the set and with respect to a randomly chosen variable. In this very 
simple case, one can easily show that the complexity for solving a set of linear equations 
with initial parameter 7 = 70 is exactly the same as for solving a larger system with 7 > 70 
up to time t = 70 . For this reason, in this case the function pit] 7) does not depend on 7 
and can be calculated once for all the relevant 7 values. 

Moreover, it is known || that this algorithm, in the limit of very large N, keeps the 
matrix sparse for all 7 < 2/3. 

In Fig. [3] we show the function pit) for many large iV values. The dotted-dashed line is a 
guide to the eyes and it should not be too much different from the thermodynamical limit: 
It goes through the two points (7 = 2/3 and 7 = 0.918) where p{t) must vanish and coincide 
with numerical data in the region, where data for different sizes seem to be quite close to 
the asymptotic shape. 

In the thermodynamical limit, the algorithm keeps the matrix sparse for times t < 2/3 and 
so it undergoes an easy/hard transition at 7 = 2/3: As long as 7 < 2/3, pmaxil) = 0, while 
Pmaxil) > for 7 > 2/3. As we will see below the location of the transition depends on the 
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FIG. 3: Density of ones in the L matrix during the solving process with the simplest Gaussian 
elimination algorithm. The vertical bar marks the analytical critical point 7 C = 0.918. 

algorithm used and, in this case, does not correspond to any underlying thermodynamical 
transition. 

We note en passant that the 7 value where the L matrix becomes sparse again seems 
to correspond to the critical point 7 C = 0.918 || 10, [ll|] (marked with a vertical line in 
Fig. |^). An explanation to this observation will be given in a forthcoming publication. It 
implies that the value of the critical point 7 C , which is relevant e.g. in the XORSAT model 
fl4| in theoretical computer science, could be obtained also by solving differential equations 
for p{t). 



B. Smart Gaussian elimination 



Now we turn to a more clever Gaussian elimination algorithm, which works as follows: 
At each time step, it chooses the variable x having the smallest connectivity in L, i.e. that 
corresponding to the less dense column of L, and solves with respect to x any of the equations 
where x enters in. 

Clearly, in this case, the dynamics and thus the density of ones in L depend on the initial 
7 value: A smaller 7 implies that for a longer time we can choose variables of connectivity 1, 
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FIG. 4: Density of ones in the L matrix during the solving process with a smart Gaussian elimina- 
tion algorithm (N = 8192). Inset: Zoom on the low-density part (with a different normalization). 

which do not increase the average number of ones per row in L. It can be rigorously shown 
flO| ) |rT| that this procedure keeps the density of the L matrix constant, p(t; 7) = p(0; 7), for 
times smaller than t* = 7(1 — m 3 ), where m is the largest solution to 1 — m = exp(— 37m 2 ). 
The last equation is exactly Eq. (^) with {a 3 = 1, a k ^ 3 = 0}. 

Running the algorithm for different 7 values we obtain the densities reported in the main 
panel of Fig. [|. For 7 < 7^ = 0.818 the density remains 0(1/N) all along the run, while for 
7 > 7^ there is a time when the density becomes finite and the problem hard to handle. 

In order to better show what happens around t*, we have plotted in the inset of Fig. [|the 
mean number of ones per row, k(t). It is clear that for 7 < 7d this number remains constant, 
since one can solve the system choosing only variables of connectivity 1, not altering the L 
matrix. On the contrary, for 7 > 7^ there is a time £*(7) when variables of connectivity 
1 terminate, and the algorithm has to start making substitutions in L, thus increasing the 
density of ones. 

Then 7^ marks the onset of computational hardness, both in memory and CPU time. One 
may object that also this value for the easy/hard transition may depend on the particular 
algorithm. Note, however, that a completely different linear algorithm described in Ref. || 
(which firstly works with high-connectivity variables) seems to work up to 7^. Moreover, as 
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seen in the previous section, we have analytically found that at 7^ a transition takes place, 
which drastically changes the structure of the solutions space, and so we argue that any 



algorithm running in linear time can converge only up to 7^. Indeed is shown in [ID] that 
solutions spontaneously form clusters for 7 > 7^ and this particular structure requires a 
larger memory to be stored. 



IV. THE RSA CRYPTOSYSTEM AND FACTORIZATION 

In this section we shall validate the above scenario on a concrete application, namely 
integer factorization problems arising in the RSA cryptosystem. Such problems allow for 
a non-trivial mapping onto huge linear systems in GF(2) with a rather peculiar structure 
of the underlying contact matrix. In order to be as self-contained as possible, we firstly 
give a short review of the problem and the methodology (a detailed description of the RSA 
cryptosystem can be found in ||). 

The only known method for breaking RSA implies factorization of the private key, which 
consists in a natural number which is the product of two big prime numbers, n = p ■ q, 
with p and q approximately of the same size ~ yfn. Keys currently used in applications are 
numbers n ranging from 1024 bits (309 decimal digits) to 2048 bits (617 digits) length. 

The first attempt at a massive parallel factorization was the RSA129 (129 digits, 428 
bits) challenge, solved in 1994 with the quadratic sieve (QS) algorithm. More recently, 
in August, 1999 the RSA155 challenged was solved using the general number field sieve 
(GNFS) algorithm. This has forced to abandon the 512-bit (155 digits) length for sensitive 
information security. 

There are now several sub-exponential algorithms for solving the factorization problem, 
the faster of which is GNFS. QS and GNFS share the same structure, consisting of two 
phases: a first one in which a big (the size depending mostly on the size of n) linear system 
in GF(2) is produced, and a second one in which this system is solved. 

Although the first phase is definitely more costly, the solving phase (which affect this 
paper) takes a respectable part of the total time and memory requirement. Especially as 
numbers get bigger this becomes a limitation, because the fastest solving methods used 
employ a sole workstation, with the consequent memory restriction. Moreover, in recent 
factorizations a new filtering phase has been placed between the previous two, in which 
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pieces of the system (specifically columns of the {0, 1}— matrix) get discarded in order to 
simplify the solving phase, effectively transferring part of the total time from the second 
phase to the first one. 

In this section we will study statistical characteristics of linear systems produced by the 
QS algorithm. Next subsection is dedicated to a schematic description of QS. 



A. The QS algorithm 



For a nice description of the QS algorithm see flip] . Synthetically, QS works at follows. 
It builds a list of integer numbers {yi}i^i such that: 

• Hi = x 2 (modn) for some x« and yi ^ xf, 

• y-i is completely factorizable in a given (relatively small) subset of B primes called the 
factor-base. 

This is called the sieving phase. The algorithm then searches a subset J C / of elements 
of the list such that Yii&jyi = z 2 is a square (solving phase). Once found, z 2 = x 2 (modn) 
(here x = Iligj x i) an d this implies that n divides (x + z)(x — z) and then gcd(x — z, n) will 
likely (further trials will increase the probability) be a non-trivial factor of n. 



1. Sieving 

In order to find element pairs Xi,yi such that y,i = x 2 (modn) we can use the polynomial 
y = f(x) = x 2 — n and evaluate it at different values of x, keeping only values of y which 
completely factorize between the first B primes (the factor-base). The sieving will allow us 
to do this efficiently 

The idea is that, given p, it is easy to find which are the values of f(x) which are divisible 
by p, because p divides f(x) if and only if f(x) = x 2 — n = (modp) and this is a quadratic 
equation in GF(p), having at most 2 solutions. These solutions are nothing but the square 
roots of n modulo p (if they exist). 

This has a first consequence, i.e. that a prime p will not divide f(x) if n is not a square 
modp independently of the value of x. So if we can detect these primes, we can eliminate 
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them directly from our set of primes. Detecting them is very easy: Using Fermat's little 
theorem, we know that 



assuming that p do not divide n (which is trivially a reasonable assumption, anyway, because 
we are searching a divisor of n). If p is an odd prime, i.e. not 2 (all n are a squares (mod 2)), 



to be handy. 

If n = s 2 (modp) then m = s p_1 = l(modp). Conversely, if m = 1 then n is a square 
modulo p (not proven here). The number m is called the Lagrange symbol and can be 
computed efficiently in one of the firsts stages of the algorithm. Useful primes (those with 
m — 1) are roughly a random half of all the first B primes. So now we will keep only this 
half and redefine "the first B primes" as "the first B primes with m — 1". 

Computing the square root modulo p is a bit more difficult than knowing that it exists, 
but can also be done efficiently. For instance, the easiest case is when is an integer, then 
(n^ - ) = rfi*~ = inn (modp). As m = 1 (or else there is no solution) then in 2 ^ - (modp) 
are the required square roots. 

Once we have computed the two solutions f(Xp' 2 ) = (modp), then adding p, 2p, 3p, . . . 
to them we will obtain all x such that f(x) is divisible by p. 

The sieve idea is to initialize an array with values of f(x) for consecutive x G [[v^L W™\ + 
M] indexed by x, and then for each p in our factor base to divide the corresponding arithmetic 
progression of {f{x^ 2 + kp), k — 1, . . .} by p. At the end those values which are completely 
factored between the primes in the factor base will become 1 (Well, not exactly. Some of 
them can have multiple times the same prime factor. But we can set up a threshold instead 
of 1 below which we consider the number completely factored. We can recheck afterwards). 
We take those values and put their factorization in an array 




(17) 



then calling m = n P 2 we have that m 2 = 1 (modp), so m = ±1 (modp). This m will prove 



Pi 




(mod 2) 



a B ■■■ a B 



(m) 
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2. Solving 



The solving phase is conceptually simple: A solution of the homogeneous linear system 
Av = is a {0, 1} vector v which represent correctly the subset J, in the sense that v « = lif 
and only if i G J. 



B. The matrix ensemble 



1. Correlations 



We have implemented the simplest QS described in [15[ in order to analyze the output 
matrix ensemble. We attempted to look for correlations in the presence/absence of different 
primes in the set of divisors of the variables yi. Specifically we checked that there is virtually 
no correlation between rows of the matrix: We have taken one such output matrix (resulting 
from the factorization of a product of two 20 digits primes) and computed the covariance 
between the corresponding spin variables s\, S2 of two rows r\,r2, the averages being taken 
along different columns, 

(sis 2 ) - (si)(s 2 ) ■ 

Once repeated for all r± < r 2 , we found that all pairs have correlations in the interval 0±0.06, 
a proportion of 0.9999 pairs having correlations in ± 0.02. 



2. Dependence on "factorization hardness" 

We then examined dependence of the resulting distributions of ones per row on the 
"factorization hardness" of the number n. Typically (depending on the algorithm) the 
complexity of factorization depends on the size of the smallest prime divisor of n ||17|| : For 
instance, trial division ends in exactly this amount of steps. It was conjectured that this 
would be reflected in the structure of the output matrix. 

We have constructed 25 numbers n with different factor sizes (from now on, factor type 
10+10+10 will mean a 30 digit number constructed as a product of tree 10-digit primes) 
organized as follows: 

• 5 of type 20+20 
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FIG. 5: Probability distribution of the number of ones per row in 5 different matrices of size around 
1300. The line is the best power law fit on X > 3 data, giving an exponent ~ —2.2. 

• 5 of type 10+30 

• 5 of type 13+13+13 

• 5 of type 10+10+10+10 

• 5 of type 5+5+5+5+5+5+5+5 

All 25 numbers differed between them in less than a 0.01%. We then made QS compute the 
factorization matrices, with a factor base of size 1500. This value for the size of the factor 
base has been chosen experimentally in order to minimize the sieving phase duration. The 
resulting matrices were of size 1500 x 1510 and were then post-processed in order to remove 
rows and columns with a single 1. The final size is thus reduced of about 200 columns and 
rows. The resulting distributions of ones per row are plotted in Fig. [|, showing very little 
variations. They can be very well described by a unique distribution, which is substantially 
a power law with some little deviations in the range of type-2 and type-3 rows. The best fit 
in the region X > 3 gives an exponent ~ —2.2. 

Our conclusion is that statistical properties of the resulting matrix do not depend on 
the factorization hardness. The bottleneck for factorizing a large hard number is mainly 
determined by the time required by QS to generate the matrix, which indeed strongly 
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depends on the size of the smallest factor. In the rest of the paper we will analyze the 
solving phase, assuming the factorization matrix to have uncorrelated rows and the number 
of ones per row to be a random variable extracted from distribution in Fig. [5[ These two 
assumptions have been experimentally verified. 

C. Linear solving methods 

Plain standard Gaussian elimination execution time is cubic in the size of the matrix 
(our matrices are almost square). Fortunately, we can pack 32 matrix entries in a single 
4-byte word, and then the sum operation is implemented as the low-cost bit-wise logical 
XOR operation, saving a factor 32 in time. 

As also the matrix is very sparse, instead of keeping in memory all of it, we can memorize 
only the position of l's. This forbid us to use the factor-32 trick, but allows us to do the 
first steps very quickly. At some time in the Gaussian elimination process (typically more 
than half of the process), the remaining (non eliminated) part will be very dense, and then 
it will be convenient to switch to the standard method above. This is what was done in the 
solving phase of RSA129. 

Another option is to use in one of the stages an iterative algorithms, like the discrete 
Lanczos. The Lanczos method has the advantage of having a stable 0(N 2 ) total time for 
a sparse matrix, but finds only one solution (or a prefixed quantity in the block-Lanczos 
variant) instead of all of them. For factorization this is not a problem, because we need only 
a few solutions to have a reasonable chance. This is the method that was used in the solving 
phase of RSA155. 

D. Power law distributed {a^}: Phase diagram and comparison with real applica- 
tion data 

The previous analysis leads to the construction of matrices whose density of non zero 
entries follows quite well a power law distribution with light deviations due to rows with 
a small number of ones and a cutoff, k max , of some hundreds. Then we use the following 
distribution in the analytical treatment: 

a,2 = a , (18) 
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FIG. 6: Phase diagram (a, 7) for a typical choice of s = 2.2 and k max = 200. The bold line l/(2a) 
represents a continuous transition, while 7d(a) and 7 c (a) corresponds respectively to the spinodal 
and the critical lines of a first order transition. The dot marks the origin of these lines. 

a k = e k~ s for 3 < k < k max , (19) 

where e is a normalizing factor equal to (1 — a) / Ylk=T k~ s ■ The factorized integers considered 
in the previous section lead to an exponent s ~ 2.2 and to a non zero support up to 
kmax ~ 200. The choice of keeping and only 02, as an independent parameter is dictated 
by the very difference in the physical behavior of 2-spin terms and /c-spin terms with k > 2. 

The study of the phase diagram in the control parameter 7 for choices of a, s and k max 
retrieved from real data reveals a non trivial behavior. 

In Fig. H we show the phase diagram for s = 2.2 and k max = 200. Only part of the entire 
phase diagram (a G [0, 1], 7 G [0, 1]) is shown for clarity. The lines further go on smoothly 
outside the drawn portion. 

If a is high enough, we are in the rightmost region I of the phase diagram, where algo- 
rithms smoothly find solutions to the system and do not undergo any critical slowing down. 
Indeed, crossing the bold iperbole 7 = l/(2a) given by the condition dG(m)/dm\ m=0 = 0, 
the system undergoes a continuous transition in the order parameter m, representing the 
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FIG. 7: Dependence on s and k max of the origin of first order critical lines. The bold curve is 
the continuous phase transition 7 = l/(2o). Each solid bell-shaped curve in the left plot is the 
ensemble of such origins, defined as the point where, decreasing a, another non trivial solution to 
the saddle point equations appears. Each curve from right to left is indexed by a different value 
of kmax = 10, 30, 100, 200, 1000, 2000, 10000. Each point on the curve corresponds to a particular 
value of s (the dot is for s = 2.2 and k max = 200 as in Fig. |^). Along the curve s increases for 
decreasing 7 (see right plot). From each point of the curve originate the two first order critical 
lines shown for s = 2.2 and k max = 200 in Fig. ||, and pictorially drawn for different s values in 
the right plot. When the origin joins the second order iperbole the system is at a tricritical point, 
^tricriticai scales very rapidly with k max converging to ~ 2.73 — 2.74 already for k max ~ 100. 

fraction of variables taking the same value in all the solutions. The problem of finding 
solutions is always easy, as for the case {02 = 1; a>k^2 — 0} explained before. 

Decreasing a we meet a first intermediate region II, where the birth of a meta-stable 
non-trivial saddle-point solution at 7 = 7^(0) is given by the solution of Eq. (|8|). However, 
algorithms should not be much affected by this meta-stable state, because the system starts 
magnetizing continuously before, crossing the bold line. Increasing 7 up to the critical value 
7 c (a) one meets a first order transition, where the magnetization, that was already non-zero, 
undergoes a further jump. 

The second central region III shows an inversion between 7<j(a) and the bold line l/(2a). 
These two intermediate regions have not been exhaustively studied yet, because real data all 
fall in the leftmost one. The shape of the central part of the phase diagram is very sensitive 
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to the choice of the control parameters s and k max , as shown in Fig. [7[ 
The 7 c (a) curve in the second and third regions is found solving 

S{m\ lc ) = S(m*, lc ) , (20) 

where m* is the smallest positive solution to G(m) = 0, which corresponds to the magnetiza- 
tion of the ferromagnetic state arisen from the second order transition (bold line). The points 
of crossing showing the onset of different regions, from right to left, are found respectively as: 
¥ = 0&5K 7 ) = %„7), T = 0&7 = ^ and S(m*, 7) = S(0, 7) & 7 = 

The leftmost part IV shows the typical behavior described in || . Increasing 7 the system 
never reaches the continuous transition on the bold line, but it undergoes a first dynamical 
transition at 7<j(a) and second thermodynamical one at 7 c (a), found via Eq. (|20D with m* = 
since we are still below the second order transition line. Configurational entropy is non-zero 
between 7<j(a) and 7 c (a), and solving algorithms are affected by it. There are typically other 
spinodal lines in the phase diagram, but they always correspond to sub-optimal solutions, 
and were, therefore, not shown in the picture. 

In real data the fraction of 2-variables equations is typically of the order of 0.2 and 
7 ~ 1. So we always work deep into phase IV where, during the solving procedure, the 
system undergoes a first dynamical transition, that corresponds to a slowing down of the 
solving algorithms, before finding solutions. 



V. CONCLUSIONS 



We have analyzed the behaviour of different type of polynomial algorithms in the solutions 
of large-scale linear systems over finite fields. The connection between memory requirements 
and clustering phase transitions as been made clear on both artificially generated problem 
as well as on a "real-world" applications. While the role of the dynamical glass transition in 
local search algorithm was already well known (trapping in local minima), we have provided 
a clear example of the role of such type of glass transition in global dynamical processes which 
are guaranteed to converge to the global optimum in some polynomial time. The memory 
catastrophe found is such cases constitutes a concrete limitation for the performance of 
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single-machine programs. 
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